• Àüü
  • ÀüÀÚ/Àü±â
  • Åë½Å
  • ÄÄÇ»ÅÍ
´Ý±â

»çÀÌÆ®¸Ê

Loading..

Please wait....

±¹³» ³í¹®Áö

Ȩ Ȩ > ¿¬±¸¹®Çå > ±¹³» ³í¹®Áö > Çѱ¹Á¤º¸°úÇÐȸ ³í¹®Áö > Á¤º¸°úÇÐȸ³í¹®Áö (Journal of KIISE)

Á¤º¸°úÇÐȸ³í¹®Áö (Journal of KIISE)

Current Result Document :

ÇѱÛÁ¦¸ñ(Korean Title) Doc2Vec°ú Word2VecÀ» È°¿ëÇÑ Convolutional Neural Network ±â¹Ý Çѱ¹¾î ½Å¹® ±â»ç ºÐ·ù
¿µ¹®Á¦¸ñ(English Title) Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec
ÀúÀÚ(Author) ±èµµ¿ì   ±¸¸í¿Ï   Dowoo Kim   Myoung-Wan Koo  
¿ø¹®¼ö·Ïó(Citation) VOL 44 NO. 07 PP. 0742 ~ 0747 (2017. 07)
Çѱ۳»¿ë
(Korean Abstract)
º» ³í¹®¿¡¼­´Â ¹®ÀåÀÇ ºÐ·ù¿¡ ÀÖ¾î ¼º´ÉÀÌ ÀÔÁõµÈ word2vecÀ» È°¿ëÇÑ Convolutional Neural Network(CNN) ¸ðµ¨À» ±â¹ÝÀ¸·Î ÇÏ¿© ¹®¼­ ºÐ·ù¿¡ Àû¿ë ½Ã ¼º´ÉÀ» Çâ»ó½ÃÅ°±â À§ÇØ doc2vecÀ» ÇÔ²² CNN¿¡ Àû¿ëÇÏ°í ±â¹Ý ¸ðµ¨ÀÇ ±¸Á¶¸¦ °³¼±ÇÑ ¹®¼­ ºÐ·ù ¹æ¾ÈÀ» Á¦¾ÈÇÑ´Ù. ¸ÕÀú ÅäÅ«È­ ¹æ¹ýÀ» ¼±Á¤Çϱâ À§ÇÑ Ãʺ¸ÀûÀÎ ½ÇÇèÀ» ÅëÇÏ¿©, ¾îÀý ´ÜÀ§, ÇüÅÂ¼Ò ºÐ¼®, Word Piece Model(WPM) Àû¿ëÀÇ 3°¡Áö ¹æ¹ý ÁßWPMÀÌ ºÐ·ùÀ² 79.5%¸¦ »êÃâÇÏ¿© ¹®¼­ ºÐ·ù¿¡ À¯¿ëÇÔÀ» ½ÇÁõÀûÀ¸·Î È®ÀÎÇÏ¿´´Ù. ´ÙÀ½À¸·Î WPMÀ» È°¿ëÇÏ¿© »ý¼ºÇÑ ´Ü¾î ¹× ¹®¼­ÀÇ º¤ÅÍ Ç¥ÇöÀ» ±â¹Ý ¸ðµ¨°ú Á¦¾È ¸ðµ¨¿¡ ÀÔ·ÂÇÏ¿© ¹üÁÖ 10°³ÀÇ Çѱ¹¾î ½Å¹® ±â»ç ºÐ·ù¿¡ Àû¿ëÇÑ ½ÇÇèÀ» ¼öÇàÇÏ¿´´Ù. ½ÇÇè °á°ú, Á¦¾È ¸ðµ¨ÀÌ ºÐ·ùÀ² 89.88%¸¦ »êÃâÇÏ¿© ±â¹Ý ¸ðµ¨ÀÇ ºÐ·ùÀ² 86.89%º¸´Ù 2.99% Çâ»óµÇ°í 22.80%ÀÇ °³¼± È¿°ú¸¦ º¸¿´´Ù. º» ¿¬±¸¸¦ ÅëÇÏ¿©, doc2vecÀÌ µ¿ÀÏÇÑ ¹üÁÖ¿¡ ¼ÓÇÑ ¹®¼­µé¿¡ ´ëÇÏ¿© À¯»çÇÑ ¹®¼­ º¤ÅÍ Ç¥ÇöÀ» »ý¼ºÇϱ⠶§¹®¿¡ ¹®¼­ÀÇ ºÐ·ù¿¡ doc2vecÀ» ÇÔ²² È°¿ëÇÏ´Â °ÍÀÌ È¿°úÀûÀÓÀ» °ËÁõÇÏ¿´´Ù.
¿µ¹®³»¿ë
(English Abstract)
In this paper, we propose a novel approach to improve the performance of the Convolutional Neural Network(CNN) word embedding model on top of word2vec with the result of performing like doc2vec in conducting a document classification task. The Word Piece Model(WPM) is empirically proven to outperform other tokenization methods such as the phrase unit, a part-of-speech tagger with substantial experimental evidence (classification rate: 79.5%). Further, we conducted an experiment to classify ten categories of news articles written in Korean by feeding words and document vectors generated by an application of WPM to the baseline and the proposed model. From the results of the experiment, we report the model we proposed showed a higher classification rate (89.88%) than its counterpart model (86.89%), achieving a 22.80% improvement. Throughout this research, it is demonstrated that applying doc2vec in the document classification task yields more effective results because doc2vec generates similar document vector representation for documents belonging to the same category.
Å°¿öµå(Keyword) doc2vec   word2vec   word piece model(wpm)   convolutional neural network(cnn)  
ÆÄÀÏ÷ºÎ PDF ´Ù¿î·Îµå